KEY VARIABLES OF INTEREST:
Here is the first pass at the plot.
Here’s some text for V2
Here’s some text for V3
Here’s some text for V4
---
title: "Big Cities Health Inventory Data Visualization"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: fill
social: menu
source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(rio)
library(colorblindr)
library(janitor)
library(magrittr)
library(ggrepel)
library(fontawesome)
```
# Background {data-orientation=rows data-icon="fa-info-circle"}
Sidebar {.sidebar}
-------------------------------
**Background**
The [Big Cities Health Coalition](https://twitter.com/bigcitieshealth?lang=en) (BCHC) is a large-scale collaboration among 30 of the largest urban health departments in the United States. See the BCHC's [informational brochure](https://static1.squarespace.com/static/534b4cdde4b095a3fb0cae21/t/5c7fc5cd6e9a7f44b5abf311/1551877582500/BCHC_ABOUT+US.pdf) for more details. You can download the complete dataset [here](http://bchi.bigcitieshealth.org/rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBGdz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--c6b5c30fbd8b79859797e1dc260a06064c8f3864/Current%20BCHI%20Platform%20Dataset%20(7-18)%20-%20Updated%20BCHI%20Platform%20Dataset%20-%20BCHI,%20Phase%20I%20&%20II.csv?disposition=attachment), which contains over 30,000 data points across a large variety of health indicators, e.g., behavioral health & substance abuse, chronic disease, environmental health, and life expectancy, to name just a few.
This project includes only a tiny fraction of the available BCHC data, focusing in particular on **obesity rate**, **heart disease mortality rate**, and **opioid-related mortality rate**. Click on the icons to the right for more information on these variables. The goal of this project is provide three data visualizations using these variables and document different iterations of these visualizations.
Row {data-height=600}
-----------------------------------------------------------------------
### Title {.no-title}
Click the image below to access the BCHC data platform:
[](http://www.bigcitieshealth.org/city-data)
### Title {.no-title}
Cities included in the BCHC. Click the map below for more information on city membership.
[](http://www.bigcitieshealth.org/our-members-big-cities-health-coalition-bchc/)
Row {data-height=90}
-----------------------------------------------------------------------
### Title {.no-title}
**KEY VARIABLES OF INTEREST:**
Row {data-height=300}
-----------------------------------------------------------------------
### Title {.no-title}
*Obesity Rate*
[](http://www.bigcitieshealth.org/obesity-physical-activity)
### Title {.no-title}
*Heart Disease Mortality Rate*
[](https://bchi.bigcitieshealth.org/indicators/1834/searches/22955)
### Title {.no-title}
*Opioid-Related Mortality Rate*
[](http://www.bigcitieshealth.org/combatting-opioids)
```{r import data, warning=FALSE}
data_raw <- import("http://bchi.bigcitieshealth.org/rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBGdz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--c6b5c30fbd8b79859797e1dc260a06064c8f3864/Current%20BCHI%20Platform%20Dataset%20(7-18)%20-%20Updated%20BCHI%20Platform%20Dataset%20-%20BCHI,%20Phase%20I%20&%20II.csv?disposition=attachment")
# wrangle data
data_filt <- data_raw %>%
clean_names() %>%
select(shortened_indicator_name, year, sex, race_ethnicity, value, place) %>%
filter(shortened_indicator_name %in% c("Adult Physical Activity Levels", "Teen Physical Activity Levels", "Adult Binge Drinking","Adult Obesity","Heart Disease Mortality Rate","Bike Score","Walkability","Median Household Income","Race/Ethnicity","Death Rate (Overall)")) %>%
mutate(value = as.numeric(value)) %>%
mutate_at(c("sex", "race_ethnicity", "place"), factor) %>%
mutate(place = plyr::mapvalues(x = .$place, from = c("Fort Worth (Tarrant County), TX", "Indianapolis (Marion County), IN", "Las Vegas (Clark County), NV", "Miami (Miami-Dade County), FL", "Oakland (Alameda County), CA", "Portland (Multnomah County), OR"), to = c("Fort Worth, TX", "Indianapolis, IN", "Las Vegas, NV", "Miami, FL", "Oakland, CA", "Portland, OR"))) %>%
na.omit()
```
# Obesity x City {data-icon="fa-weight"}
Sidebar {.sidebar}
-------------------------------
**Visualization #1**
This plot represents average obesity rates for adults (18 years and over) across all years in the dataset (2010-2018) for each city. Here, obesity rate refers to the percentage of the population that meets criteria for obesity. In general, obesity is defined in this dataset as [Body Mass Index (BMI)](https://www.cdc.gov/healthyweight/assessing/bmi/index.html) of 30 or greater. This plot includes data for all races and both males and females. The average obesity rate for the entire U.S. is represented by the black bar. States that have higher obesity rates than the national average are colored red, and states below the national average are colored blue. From this plot, it is easy to discern that Detroit, MI had the highest average obesity rate from 2010-2018, while San Francisco, CA had the lowest average obesity rate during this time frame.
This plot is intended for a general audience. See the plots on the right for different iterations of this visualization.
```{r, warning}
# wrangle data
data_obesity <- data_filt %>%
filter(shortened_indicator_name == "Adult Obesity",
sex == "Both",
race_ethnicity == "All") %>%
spread(shortened_indicator_name, value) %>%
group_by(place) %>%
summarise(avg_obesity = mean(`Adult Obesity`, na.rm = TRUE),
sd_obesity = sd(`Adult Obesity`),
n = n()) %>%
mutate(se_obesity = sd_obesity/(sqrt(n)))
```
Column {data-width=650}
-----------------------------------------------------------------------
### Final plot
```{r}
data_obesity %>%
mutate(compare_us_tot = ifelse(
avg_obesity > .$avg_obesity[which(data_obesity$place == "U.S. Total")], "above",
ifelse(avg_obesity < .$avg_obesity[which(data_obesity$place == "U.S. Total")], "below", "avg"))) %>%
ggplot(aes(fct_reorder(place, avg_obesity), avg_obesity)) +
geom_col(aes(fill = compare_us_tot), alpha = 0.8) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_fill_manual(values = c("#BA4A00", "black", "#ABCFF7")) +
labs(title = "Percent of Adults Who Are Obese", subtitle = "Average obesity rates for each city from 2010-2018", y = "Percent Obese", x = NULL, caption = "Vertical line represents the U.S. average.\n States above/below the U.S. average are colored red/blue, respectively.") +
theme_minimal() +
geom_hline(yintercept = data_obesity$avg_obesity[which(data_obesity$place == "U.S. Total")], linetype = 2) +
theme(legend.position = "none")
```
Column {.tabset data-width=350}
-----------------------------------------------------------------------
### Version 1
```{r}
data_obesity %>%
ggplot(aes(place, avg_obesity, avg_obesity)) +
geom_col() +
coord_flip()
```
> Here is the first pass at the plot.
### Version 2
```{r}
data_obesity %>%
ggplot(aes(fct_reorder(place, avg_obesity), avg_obesity)) +
geom_col() +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
labs(title = "Percent of Adults Who Are Obese", y = "Percent", x = NULL) +
theme_minimal()
```
> Here's some text for V2
### Version 3
```{r}
data_obesity %>%
mutate(compare_us_tot = ifelse(
avg_obesity > .$avg_obesity[which(data_obesity$place == "U.S. Total")], "above",
ifelse(avg_obesity < .$avg_obesity[which(data_obesity$place == "U.S. Total")], "below", "avg"))) %>%
ggplot(aes(fct_reorder(place, avg_obesity), avg_obesity)) +
geom_segment(aes(color = compare_us_tot, x = fct_reorder(place, avg_obesity), xend = place, y=0, yend = avg_obesity), size = 1, alpha = 0.7) +
geom_point(aes(color = compare_us_tot), size = 3, alpha = 0.7) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_color_manual(values = c("#BA4A00", "black", "#ABCFF7")) +
labs(title = "Percent of Adults Who Are Obese", y = "Percent", x = NULL) +
theme_minimal() +
geom_hline(yintercept = data_obesity$avg_obesity[which(data_obesity$place == "U.S. Total")], linetype = 2) +
theme(legend.position = "none")
```
> Here's some text for V3
### Version 4
```{r}
data_obesity %>%
mutate(compare_us_tot = ifelse(
avg_obesity > .$avg_obesity[which(data_obesity$place == "U.S. Total")], "above",
ifelse(avg_obesity < .$avg_obesity[which(data_obesity$place == "U.S. Total")], "below", "avg"))) %>%
ggplot(aes(fct_reorder(place, avg_obesity), avg_obesity)) +
geom_errorbar(aes(ymin = avg_obesity - 1.96*se_obesity,
ymax = avg_obesity + 1.96*se_obesity),
color = "gray40") +
geom_point(aes(color = compare_us_tot), size = 4, alpha = 0.7) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_color_manual(values = c("#BA4A00", "black", "#ABCFF7")) +
labs(title = "Percent of Adults Who Are Obese", y = "Percent", x = NULL) +
theme_minimal() +
geom_hline(yintercept = data_obesity$avg_obesity[which(data_obesity$place == "U.S. Total")], linetype = 2) +
theme(legend.position = "none")
```
> Here's some text for V4
# Heart Disease x Obesity {data-icon="fa-heartbeat"}
Sidebar {.sidebar}
-------------------------------
**Visualization #2**
See the plots on the right for different iterations of this visualization.
```{r}
# wrangle data
obesity_hdmr <- data_filt %>%
filter(shortened_indicator_name %in% c("Adult Obesity", "Heart Disease Mortality Rate"),
sex == "Both",
race_ethnicity == "All",
place != "U.S. Total") %>%
mutate(i = row_number()) %>%
spread(shortened_indicator_name, value) %>%
group_by(place) %>%
summarize(avg_obesity = mean(`Adult Obesity`, na.rm = TRUE),
avg_hdmr = mean(`Heart Disease Mortality Rate`, na.rm = TRUE))
```
Column {data-width=650}
-----------------------------------------------------------------------
### Final plot
```{r}
## 3 most obese cities
top_3_obese <- obesity_hdmr %>%
top_n(3, avg_obesity)
## 3 least obese cities
bottom_3_obese <- obesity_hdmr %>%
top_n(-3, avg_obesity)
obesity_hdmr %>%
ggplot(aes(avg_obesity, avg_hdmr)) +
geom_point(size = 5, alpha = 0.5, color = "gray70") +
geom_point(data = top_3_obese, size = 5, color = "#BA4A00", alpha = 0.7) +
geom_point(data = bottom_3_obese, size = 5, color = "#ABCFF7", alpha= 0.7) +
geom_smooth(method = "lm", alpha = 0.2, color = "gray60") +
geom_text_repel(data = top_3_obese, aes(label = place), min.segment.length = 0) +
geom_text_repel(data = bottom_3_obese, aes(label = place), min.segment.length = 0) +
theme_minimal() +
scale_x_continuous(labels = scales::percent_format(scale = 1)) +
labs(x = "Percent Obese", y = "Heart Disease Mortality Rate", title = "Relationship between Obesity and Heart Disease", subtitle = "State labels represent 3 most/least obese states", caption = "3 most/least obese states are colored red/green, respectively. \n Heart Disease Mortality Rate is age-adjusted per 100,000 people.")
```
Column {.tabset data-width=350}
-----------------------------------------------------------------------
### Version 1
```{r}
obesity_hdmr %>%
ggplot(aes(avg_obesity, avg_hdmr)) +
geom_point() +
geom_smooth(method = "lm")
```
### Version 2
```{r}
obesity_hdmr %>%
ggplot(aes(avg_obesity, avg_hdmr)) +
geom_point() +
geom_smooth(method = "lm") +
geom_text_repel(aes(label = place)) +
theme_minimal()
```
# Opioid Deaths x Gender {data-icon="fa-tablets"}
Sidebar {.sidebar}
-------------------------------
**Visualization #3**
See the plots on the right for different iterations of this visualization.
```{r}
# wrangle data
data_opioid <- data_raw %>%
clean_names() %>%
select(shortened_indicator_name, year, sex, race_ethnicity, value, place) %>%
filter(shortened_indicator_name %in% c("Opioid-Related Overdose Mortality Rate")) %>%
mutate(value = as.numeric(value)) %>%
mutate_at(c("sex", "race_ethnicity", "place"), factor) %>%
na.omit()
# identify city with highest opioid-related overdose mortality rate from 2010 to 2016
top_opioid = data_opioid %>%
filter(sex == "Both",
race_ethnicity == "All",
place != "U.S. Total",
year %in% 2010:2016) %>%
unique() %>%
spread(shortened_indicator_name, value) %>%
group_by(place) %>%
summarize(mean_opioid = mean(`Opioid-Related Overdose Mortality Rate`, na.rm = TRUE)) %>%
top_n(1) %>%
select(place)
```
Column {data-width=650}
-----------------------------------------------------------------------
### Final plot
```{r}
data_opioid %>%
filter(sex != "Both",
race_ethnicity == "All",
place == top_opioid$place,
year %in% 2010:2016) %>%
spread(shortened_indicator_name, value) %>%
ggplot(aes(year, `Opioid-Related Overdose Mortality Rate`, color= sex)) +
geom_line(size= 2) +
geom_point(size = 4) +
labs(x = NULL, y = "Opioid-Related Overdose Mortality Rate", title = "Opioid-use Related Mortality Rates Over Time", subtitle = "Colombus, OH", caption = "Rates are age-adjusted per 100,000 people.") +
theme_minimal() +
scale_color_OkabeIto() +
theme(legend.position = "none") +
geom_label(data = data_opioid %>%
filter(sex != "Both",
race_ethnicity == "All",
place == top_opioid$place,
year == 2016) %>%
spread(shortened_indicator_name, value),
aes(y =`Opioid-Related Overdose Mortality Rate`, label = sex),
nudge_x = -0.7,
size = 5)
```
Column {.tabset data-width=350}
-----------------------------------------------------------------------
### Version 1
```{r}
data_opioid %>%
filter(sex != "Both",
race_ethnicity == "All",
place == top_opioid$place,
year %in% 2010:2016) %>%
spread(shortened_indicator_name, value) %>%
ggplot(aes(year, `Opioid-Related Overdose Mortality Rate`, color= sex)) +
geom_line()
```
### Version 2
```{r}
data_opioid %>%
filter(sex != "Both",
race_ethnicity == "All",
place == top_opioid$place,
year %in% 2010:2016) %>%
spread(shortened_indicator_name, value) %>%
ggplot(aes(year, `Opioid-Related Overdose Mortality Rate`, color= sex)) +
geom_line(size= 2) +
geom_point(size = 4) +
labs(x = NULL, y = "Opioid-Related Overdose Mortality Rate", title = "Opioid-use Related Mortality Rates Over Time", subtitle = "Colombus, OH", caption = "Rates are age-adjusted per 100,000 people.") +
theme_minimal()
```